Spatial Feature Clustering — DLPFC-151673
Sample Tissue — DLPFC-151673
H&E stained tissue section from the Visium slide. This image identifies the analyzed sample and provides anatomical context for interpreting the spatial expression patterns discovered by the clustering pipeline.

Note — Gene-Level Clustering
This pipeline clusters genes into co-expression modules based on their spatial expression patterns across all Visium spots. This is fundamentally different from 10x Genomics Space Ranger, which clusters spots (tissue regions / cell types). The gene modules identified here represent groups of genes with similar spatial expression structure, not tissue domains.
Executive Summary
This report summarizes the results of a full spatial feature clustering pipeline run on 10x Genomics Visium transcriptomics data. The pipeline clusters genes (not spots or cells) into co-expression modules in four steps: it selects spatially variable genes, computes several similarity representations (expression, spatial, mixture-of-Gaussians (MoG), and weighted), clusters the genes under each representation, and evaluates cluster quality using both internal metrics and spatial coherence.
Pipeline execution completed: 6/6 notebooks succeeded. ALL PASSED
Optimal Parameters (Sensitivity Analysis)
The parameters below were identified through systematic grid search over the weight space (α + β + γ = 1) and resolution range. The optimal combination maximizes the Silhouette score of the weighted clustering.
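As a rough illustration of the weight search described above, the sketch below enumerates weight triples on the simplex α + β + γ = 1 and picks the best one under a scoring function. The 0.1 step size, the function names, and the toy scoring function are assumptions made for illustration; the report does not state the grid spacing or how notebook 06 implements the search.

```python
from itertools import product


def weight_grid(step_tenths=1):
    """Enumerate (alpha, beta, gamma) triples with alpha + beta + gamma = 1.

    Weights are built from integer tenths, so the sum-to-one constraint is
    enforced exactly on integers before converting to floats.
    The step size is an assumption, not taken from the pipeline.
    """
    ticks = range(0, 11, step_tenths)
    return [
        (a / 10, b / 10, (10 - a - b) / 10)
        for a, b in product(ticks, repeat=2)
        if a + b <= 10
    ]


def best_weights(grid, score_fn):
    """Return the triple with the highest score; the first one wins ties."""
    return max(grid, key=score_fn)


grid = weight_grid()

# Hypothetical stand-in for "cluster with these weights, return Silhouette".
toy_score = lambda w: -abs(w[1] - 0.4) - abs(w[2] - 0.6)
print(len(grid), best_weights(grid, toy_score))
```

Because several weight combinations can reach the same Silhouette score to three decimals, the tie-breaking rule (here, first in enumeration order) can determine which triple is reported as optimal.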
Configuration
dataset_path: data/DLPFC-151673
clustering:
  method: louvain
  resolution: 0.8
  random_state: 0
  optimize_resolution: False
  resolution_range: [0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 2.0]
similarity:
  weights:
    alpha: 0.0
    beta: 0.4
    gamma: 0.6
preprocessing:
  n_top_genes: 300
  min_gene_expression: 300
  n_top_genes_hvg: 3000
  pca_components: 50
evaluation:
  n_neighbors: 6
  use_pca_for_coherence: True
_updated:
  session: run_2026-02-10_04-27-21
  date: 2026:02:10 04:27:21
  note: Auto-updated from sensitivity analysis (notebook 06)
Execution Status
| Notebook | Status |
|---|---|
| 01_explore_dataset.ipynb | SUCCESS |
| 02_baseline.ipynb | SUCCESS |
| 03_spatial_weighted_similarity.ipynb | SUCCESS |
| 04_multiview_clustering.ipynb | SUCCESS |
| 05_final_plots.ipynb | SUCCESS |
| 06_sensitivity_analysis.ipynb | SUCCESS |
Baseline Metrics
Baseline clustering uses expression-only similarity matrices (Pearson, Spearman, Cosine) without any spatial filtering. These metrics serve as the reference point against which spatially informed clusterings are compared.
What is Silhouette?
The Silhouette Score measures how similar each gene is to its own cluster compared to other clusters. Values range from −1 (wrong cluster) to +1 (well-matched). A score near 0 indicates overlapping clusters.
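To make the definition concrete, here is a minimal pure-Python sketch of the mean Silhouette score computed from a pairwise distance matrix. How the pipeline turns gene-gene similarity into distance (for example, 1 − similarity) is not stated in this report, so the input here is simply assumed to be a distance matrix; the function name is illustrative.

```python
def silhouette_score(dist, labels):
    """Mean silhouette over all items, given a square distance matrix.

    Requires at least two distinct cluster labels.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        # Mean distance from item i to every cluster, excluding i itself.
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:
            # Singleton cluster: the silhouette is defined as 0.
            scores.append(0.0)
            continue
        a = mean_to[own]                                        # cohesion
        b = min(v for c, v in mean_to.items() if c != own)      # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy data: two tight groups on a line.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_score(dist, [0, 0, 1, 1])   # labels follow the groups
bad = silhouette_score(dist, [0, 1, 0, 1])    # labels cut across the groups
print(round(good, 3), round(bad, 3))
```

The well-matched labeling scores close to +1 and the mismatched labeling scores below 0, which mirrors the interpretation given above.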
What is Calinski Harabasz?
The Calinski-Harabasz Index (Variance Ratio Criterion) is the ratio of between-cluster dispersion to within-cluster dispersion. Higher values indicate denser, better-separated clusters.
What is Davies Bouldin?
The Davies-Bouldin Index measures the average similarity between each cluster and the one most similar to it. Lower values indicate better separation.
| Similarity | silhouette | calinski_harabasz | davies_bouldin |
|---|---|---|---|
| pearson | 0.1102 | 17.2363 | 2.3873 |
| spearman | -0.2524 | 8.3527 | 3.2220 |
| cosine | 0.1183 | 17.8707 | 2.3529 |
Multi-View Comparison
The multi-view analysis clusters genes under four different similarity representations and then measures pairwise agreement. High ARI/NMI between views confirms that the core gene modules are robust; divergences highlight genes whose grouping is sensitive to spatial context.
Adjusted Rand Index (ARI) Matrix
The Adjusted Rand Index (ARI) quantifies the agreement between two clusterings, adjusted for chance. Values range from −1 (worse than random) through 0 (random) to +1 (perfect agreement). An ARI ≥ 0.8 is generally considered strong agreement.
| View | expression | spatial | mog | weighted |
|---|---|---|---|---|
| expression | 1.000 | 0.871 | 0.934 | 0.934 |
| spatial | 0.871 | 1.000 | 0.909 | 0.934 |
| mog | 0.934 | 0.909 | 1.000 | 0.973 |
| weighted | 0.934 | 0.934 | 0.973 | 1.000 |
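For readers who want to see how a single ARI entry is obtained from two label vectors, the following sketch implements the standard pair-counting formula in pure Python. It is an illustrative reimplementation, not the pipeline's code; the function name is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two clusterings of the same items (pair-counting form)."""
    n = len(labels_a)
    # Pairs of items that share a cluster in both clusterings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that share a cluster in each clustering separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0                          # both clusterings are trivial
    return (sum_cells - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # relabeled, same split
cross = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # splits disagree
print(same, cross)
```

ARI ignores the cluster names themselves, so swapping label 0 and label 1 in one view still yields perfect agreement. This matters here because each view assigns its cluster identifiers independently.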
Normalized Mutual Information (NMI) Matrix
The Normalized Mutual Information (NMI) measures the mutual dependence between two clusterings, normalized to [0, 1]. A value of 1 means the clusterings are identical; 0 means they share no information.
| View | expression | spatial | mog | weighted |
|---|---|---|---|---|
| expression | 1.000 | 0.819 | 0.892 | 0.892 |
| spatial | 0.819 | 1.000 | 0.848 | 0.893 |
| mog | 0.892 | 0.848 | 1.000 | 0.941 |
| weighted | 0.892 | 0.893 | 0.941 | 1.000 |
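The sketch below shows one way an NMI entry can be computed from two label vectors. NMI has several normalization variants; this version divides the mutual information by the arithmetic mean of the two entropies, which is an assumption, since the report does not say which variant the pipeline uses.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_a, labels_b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information summed over non-empty contingency cells.
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    return mi / denom if denom > 0 else 1.0


identical = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])
independent = normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1])
print(identical, independent)
```

Unlike ARI, NMI is not corrected for chance, which is one reason the two matrices in this section can rank view pairs slightly differently.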
Cluster Summary
Summary of the final weighted clustering. Size is the number of genes assigned to each cluster. Spatial Coherence (Moran's I) quantifies how spatially structured the average expression profile of each cluster is on the Visium tissue.
What is Moran's I?
The Moran's I statistic measures spatial autocorrelation — the degree to which nearby spots on the Visium slide share similar gene-expression patterns. Values near +1 indicate strong spatial clustering; near 0 indicates randomness; near −1 indicates dispersion.
| Cluster | Size (# genes) | Spatial Coherence (Moran's I) |
|---|---|---|
| 0 | 126 | 0.9234 |
| 1 | 174 | 0.6728 |
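The Moran's I definition above can be sketched as follows, using binary neighbor weights on a spot graph. The neighbor-list input format and the function name are assumptions for illustration; the pipeline may build its graph differently (the `n_neighbors: 6` setting in the configuration suggests a six-neighbor graph, but this report does not confirm how it is used).

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights.

    values:    one number per spot (e.g. a cluster's average expression).
    neighbors: dict mapping spot index -> list of neighboring spot indices.
    Undefined (division by zero) if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance_sum = sum(d * d for d in dev)
    # Total weight: every directed neighbor link counts once.
    w_sum = sum(len(nbrs) for nbrs in neighbors.values())
    # Cross-products of deviations over all neighboring pairs.
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / w_sum) * cross / variance_sum


# Toy layout: six spots in a row, each linked to its immediate neighbors.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
smooth = morans_i([0, 0, 0, 1, 1, 1], chain)       # one contiguous domain
alternating = morans_i([0, 1, 0, 1, 0, 1], chain)  # checkerboard-like
print(smooth, alternating)
```

A contiguous domain gives a positive value and an alternating pattern gives a negative one, matching the interpretation of the Spatial Coherence column.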
Sensitivity Analysis
Sensitivity analysis verifies that the chosen parameters are robust. If small changes to weights or resolution cause large shifts in cluster assignments (low ARI), the result is fragile. Stable, high ARI across a wide parameter range indicates a trustworthy clustering.
Top 5 Weight Combinations (by Silhouette)
| α | β | γ | Silhouette | ARI vs. baseline | # Clusters |
|---|---|---|---|---|---|
| 0.0 | 0.4 | 0.6 | 0.125 | 0.987 | 2 |
| 0.6 | 0.2 | 0.2 | 0.125 | 0.987 | 2 |
| 0.2 | 0.2 | 0.6 | 0.125 | 0.987 | 2 |
| 0.4 | 0.2 | 0.4 | 0.125 | 0.987 | 2 |
| 0.0 | 0.5 | 0.5 | 0.124 | 1.000 | 2 |
Saved Data Arrays
Summary statistics for all NumPy arrays stored during the pipeline run. These files contain raw similarity matrices, cluster label vectors, selected gene indices, and intermediate results.
| Name | Shape | Dtype | Min | Max | Mean |
|---|---|---|---|---|---|
| baseline_labels_cosine | (300,) | int32 | 0.0000 | 1.0000 | 0.5967 |
| baseline_labels_pearson | (300,) | int32 | 0.0000 | 1.0000 | 0.5800 |
| baseline_labels_spearman | (300,) | int32 | 0.0000 | 2.0000 | 1.1267 |
| baseline_similarity_cosine | (300, 300) | float32 | 0.0000 | 0.9292 | 0.2287 |
| baseline_similarity_pearson | (300, 300) | float64 | -0.2350 | 0.9017 | 0.0773 |
| baseline_similarity_spearman | (300, 300) | float64 | -0.2673 | 0.7246 | 0.0630 |
| baseline_top_genes | (300,) | int64 | 27.0000 | 33494.0000 | 16869.6200 |
| cluster_labels | (300,) | int32 | 0.0000 | 1.0000 | 0.5800 |
| cluster_labels_expression | (300,) | int32 | 0.0000 | 1.0000 | 0.5967 |
| cluster_labels_mog | (300,) | int32 | 0.0000 | 1.0000 | 0.5800 |
| cluster_labels_spatial | (300,) | int32 | 0.0000 | 1.0000 | 0.5633 |
| cluster_labels_weighted | (300,) | int32 | 0.0000 | 1.0000 | 0.5800 |
| similarity_expression | (300, 300) | float32 | 0.0000 | 0.9292 | 0.2287 |
| similarity_matrix | (300, 300) | float64 | 0.0000 | 0.9513 | 0.3855 |
| similarity_mog | (300, 300) | float64 | 0.0000 | 0.9747 | 0.3125 |
| similarity_spatial | (300, 300) | float64 | 0.0000 | 0.9950 | 0.6956 |
| similarity_weighted | (300, 300) | float64 | 0.0000 | 0.9513 | 0.3855 |
| top_genes | (300,) | int64 | 27.0000 | 33494.0000 | 16869.6200 |
| top_genes_multiview | (300,) | int64 | 27.0000 | 33494.0000 | 16869.6200 |
Visualizations
All visualizations generated during the pipeline run are collected below. Each figure is accompanied by an explanation of what it shows and how to interpret it.
Notebook 01 — Data Exploration
Distribution of Total Counts per Gene
Histogram of the total counts aggregated per gene across all spots. Most genes have very low total counts; a small number of highly expressed genes dominate. This motivates filtering to the top spatially variable genes.

Distribution of Total Counts per Spot
Histogram of the total UMI (unique molecular identifier) counts per spot. Spots with very low counts may indicate empty or low-quality capture areas. A unimodal distribution with a long right tail is typical for Visium data.

Distribution of Detected Genes per Spot
Histogram of the number of distinct genes detected (count > 0) per spot. This is a key QC metric — spots with unusually few detected genes may lie outside the tissue or suffer from capture failure.

Gene Diagnostic (Exploration) — 17754
Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.

Gene Diagnostic (Exploration) — 22295
Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.

Gene Diagnostic (Exploration) — 22690
Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.

Gene Diagnostic (Exploration) — 27957
Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.

Gene Diagnostic (Exploration) — 6611
Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.

Known Marker Gene — MOBP
Diagnostic plot for MOBP, a known oligodendrocyte marker gene in DLPFC. The spatial pattern should match known tissue layer anatomy, serving as a sanity check for data quality and spatial alignment.

Tissue & Spot Layout
Visium spots overlaid on the H&E tissue image, colored by total UMI counts per spot. Brighter spots indicate higher sequencing depth. This overview confirms spot coordinates and reveals the overall tissue morphology alongside the expression intensity landscape.

Notebook 02 — Baseline Clustering
Baseline Cluster Representative — 0 gene 6546
Diagnostic plot for a representative gene from a baseline (expression-only Pearson) cluster. The spatial pattern illustrates the type of expression structure captured without any spatial filtering.

Baseline Cluster Representative — 1 gene 12035
Diagnostic plot for a representative gene from a baseline (expression-only Pearson) cluster. The spatial pattern illustrates the type of expression structure captured without any spatial filtering.

Baseline Similarity Matrices (Pearson / Spearman / Cosine)
Heatmaps of the three expression-only gene-gene similarity matrices. Pearson captures linear correlation, Spearman captures rank-order correlation, and Cosine measures directional similarity. Block structure indicates gene modules detectable from expression alone.

Notebook 03 — Spatial Weighted Similarity
Weighted Cluster Representative — 0 gene 12035
Diagnostic plot for a representative gene from the spatially weighted clustering (NB03). Compared to baseline, the MoG panel should show cleaner spatial domains thanks to the inclusion of spatial and MoG similarity components.

Weighted Cluster Representative — 1 gene 23765
Diagnostic plot for a representative gene from the spatially weighted clustering (NB03). Compared to baseline, the MoG panel should show cleaner spatial domains thanks to the inclusion of spatial and MoG similarity components.

Spatial Weighted Similarity Matrix (NB03)
Heatmap of the combined similarity matrix computed with weights α·Expr + β·Spatial + γ·MoG in notebook 03. This is the primary similarity used for the initial weighted clustering.
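The combination α·Expr + β·Spatial + γ·MoG can be sketched as an element-wise weighted sum of three gene-gene matrices. The function name, the nested-list representation, and the toy values are assumptions for illustration; the pipeline itself stores these matrices as arrays and may apply additional normalization that this report does not describe.

```python
def combine_similarities(expr, spatial, mog, alpha, beta, gamma):
    """Element-wise alpha*expr + beta*spatial + gamma*mog for square matrices."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(expr)
    return [
        [
            alpha * expr[i][j] + beta * spatial[i][j] + gamma * mog[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy 2 x 2 matrices (hypothetical values, not pipeline output).
expr = [[1.0, 0.2], [0.2, 1.0]]
spatial = [[1.0, 0.6], [0.6, 1.0]]
mog = [[1.0, 0.4], [0.4, 1.0]]

weighted = combine_similarities(expr, spatial, mog, alpha=0.5, beta=0.3, gamma=0.2)
print(weighted)
```

Because the combination is linear, the mean of the weighted matrix equals the same weighted sum of the three component means. As an arithmetic check only, the component means reported under Saved Data Arrays (0.2287, 0.6956, 0.3125) give about 0.3855 with weights (0.5, 0.3, 0.2), which matches the reported mean of similarity_weighted, and about 0.466 with the auto-updated weights (0.0, 0.4, 0.6). This suggests, but does not establish, that the saved weighted matrix was computed before the configuration was updated by notebook 06.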

Notebook 04 — Multi-View Clustering
ARI / NMI Inter-View Comparison (NB04)
Pairwise comparison of the four clustering views (expression, spatial, MoG, weighted) using ARI (left) and NMI (right). High off-diagonal values mean the two views largely agree on gene groupings; lower values reveal genes reclassified when spatial or MoG information is introduced.

View-Switching Gene (NB04) — 12035
Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.

View-Switching Gene (NB04) — 23259
Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.

View-Switching Gene (NB04) — 28749
Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.

View-Switching Gene (NB04) — 9104
Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.

View-Switching Gene (NB04) — 9816
Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.

Multi-View Representative — expression gene 6546
Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.

Multi-View Representative — mog gene 12035
Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.

Multi-View Representative — spatial gene 12035
Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.

Multi-View Representative — weighted gene 12035
Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.

Notebook 05 — Final Plots
Similarity Matrices Overview (Publication)
Side-by-side heatmaps of the four gene-gene similarity matrices used in multi-view clustering: Expression (Pearson on raw counts), Spatial (after mean filter), MoG (binarized), and Weighted (α·Expr + β·Spatial + γ·MoG). Block-diagonal structure indicates clear gene modules.
